Abstract (hide during talk)


Although the concept of (logically and thermodynamically) reversible
computation was first shown to be theoretically coherent by Bennett
46 years ago, relatively little attention has been paid to this concept
over the decades, in terms of its potential to develop into a viable
approach towards making computers more energy efficient in practice.
However, technological trends have now brought us to the point
where beginning to increase the degree of reversibility of our
computational operations, at all levels from device physics to architectures,
will soon be the only remaining way to make substantial further
improvements to the energy efficiency of general digital computation,
with no fundamental limits to that efficiency yet known. Many of the
apparent barriers to making this concept practical have already been
swept aside by the relatively small amount of research that has been done
to date. In this talk, I review the progress that has been made in this
area so far, as well as the issues that remain to be addressed, and
argue that this technology direction needs to become a major focus of
our long-term R&D efforts looking forward.


Abstract / Outline of Talk


= Widespread perception today that we are approaching limits 
on intrinsic energy efficiency for general digital computation 
= This perception is correct—but only for conventional irreversible logic 


= An alternative paradigm for digital computation called 
reversible computing that can circumvent these limits has 
been known for ~46 years now... 
= But, little attention paid to it so far as a viable path forward... 


= We should start paying more attention to it soon, because:
= Many apparent barriers to its practicality have been demolished
= Difficult (not insurmountable!) challenges remain to be addressed
= It will soon be the only remaining way forward for general digital computation
= Need to ramp up research now to have full solutions when needed! 
= The potential upside from this technology is almost unlimited... 


We’re picking low-hanging fruit...


The “golden apple” of 
reversible computing is 
difficult to reach, but it 
offers us the greatest & 
most beautiful long-term 
future for computing... 


Semiconductor Roadmap is Ending...

= Thermal noise on gates of minimum-width segments of FET gates leads to channel PES fluctuations when E_g ≲ 1-2 eV
= Increases leakage, impairs practical device performance
= Thus, ITRS has minimum gate energy asymptoting to ~2 eV
= Also, real logic circuits incur many further overhead factors:
= Transistor width 10-20× min.
= Parasitic (junction, etc.) transistor capacitances (~2×)
= Multiple (~2) transistors fed by each input to a given logic gate
= Fan-out to a few (~3) logic gates
= Parasitic wire capacitance (~2×)
= Due to all these overheads, the energy of each bit in real logic circuits is many times larger than the min.-width gate energy
= 375-600× (!) larger in ITRS’15
= ∴ Practical bit energy for irreversible logic asymptotes to ~1 keV!
= Practical, real-world logic circuit designs can’t just magically cross this ~500× architectural gap!
= ... much larger practical limits!
= The end is near!

[Figure: ITRS 2015 ½CV² node energy vs. gate energy (log scale). Series: ITRS FO3 node energy; ITRS gate energy (est.). Data source: International Technology Roadmap for Semiconductors, 2015 edition. Callout: Only reversible computing can take us from ~1 keV at the end of the CMOS roadmap, all the way down to < kT.]


Implications for FLOPS & power

Note: The limits suggested by the diagonal lines do not even include power overheads for interconnects, memory, or cooling!

[Figure: system power (µW to GW) vs. performance (MFLOP/s to ZFLOP/s), showing the Top 100 supercomputers and HPC evolution against diagonal limit lines for 2015 ITRS, 2030 min. gate E, thermal noise, and Landauer. Callouts: >1 GW in 2030; >1 MW near thermal noise; 10s of kW at Landauer. The region beyond the Landauer line is labeled the “Forever Forbidden Zone” for All Irreversible Computing.]


Reversible Computing: What? Why?
= Fundamental microphysics is reversible: it conserves information!
= Therefore, losing information from a digital system (by erasing/overwriting it)
necessarily implies ejecting that information into the system’s environment
= Once thermalized by the environment, information that was previously known
(correlated) becomes entropy (unknown/uncorrelated information)
- ...and this implies dissipation of kT ln 2 of organized energy (work) to heat at
temperature T per bit of information lost (Landauer’s Principle)


= Unfortunately, in the conventional (irreversible) computing paradigm,
we discard computational information all the time...
= Every active conventional logic gate destructively overwrites its output node
on every clock cycle, losing the information embodied in the previous output
= Similarly for line drivers, on every bus cycle for every interconnect wire
= And for memory cells/lines, every time a cell is written, read out or refreshed


= How can we compute without losing information? (And please note that
“computing” includes driving interconnects, accessing memory, etc. as needed!)
= Reversibly transform states, instead of destructively overwriting them!
= This then allows avoiding the Landauer principle’s limit on energy efficiency


= There is no known fundamental (technology-independent) limit on computational
energy efficiency, but only if the reversible computing principle is used!
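
To put a number on Landauer’s Principle as stated above, here is a minimal Python sketch that evaluates kT ln 2 and compares it with the ~1 keV practical bit energy quoted on the roadmap slide. The 300 K temperature and the function name are assumptions made for illustration; the physical constants are the exact SI values.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K (exact SI value)
EV = 1.602176634e-19   # joules per electron volt (exact SI value)

def landauer_limit_joules(temp_kelvin):
    """Minimum heat dissipated per bit of information lost: k*T*ln(2)."""
    return K_B * temp_kelvin * math.log(2)

limit_j = landauer_limit_joules(300.0)   # assumed room temperature
limit_ev = limit_j / EV
cmos_bit_ev = 1000.0                     # ~1 keV practical bit energy (roadmap slide)

print(f"kT ln 2 at 300 K: {limit_j:.3e} J = {limit_ev * 1000:.1f} meV")
print(f"~1 keV is about {cmos_bit_ev / limit_ev:,.0f}x the Landauer limit")
```

At room temperature the limit comes out near 18 meV per bit, so the ~1 keV end-of-roadmap figure sits between four and five orders of magnitude above it.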


International Roadmap for Devices
and Systems (IRDS), 2017 edition (irds.ieee.org)


= Beyond CMOS chapter (123 pp., 1079 refs.) 


= Sec. 5, Emerging Device-Architecture
Interaction


= Focuses on unconventional computing
paradigms (besides quantum)


- (Quantum computing is in a new
chapter in the 2018 edition)


= A relevant paragraph from the
section’s Introduction is quoted below...


- I’ll elaborate on some of these points.


Reversible (adiabatic and/or ballistic) computing (§5.4) — Computing paradigms that approach logical and physical 
reversibility offer the potential to greatly exceed the energy efficiency of all other approaches to general-purpose digital 
computation. Primitive devices for reversible computing may include devices having fairly conventional functions (such 
as switches or oscillators). These devices would need to be optimized differently to use quasi-reversible physical 
processes such as near-adiabatic state transitions, near-ballistic signal propagation, highly elastic interactions, and highly 
underdamped oscillations. Reversible devices also must be organized into circuits and architectures in tightly constrained 
ways, for reversibility at the logical as well as physical level. Careful fine-tuning and optimization of analog circuit
characteristics (e.g., resonator quality factors or elasticity of ballistic interactions) remains a difficult and crucially 
important engineering challenge that must be met in order for this paradigm to realize its promise. 


Basic Physics of Computing Issues


= Sadly, the literature is full of fundamental misunderstandings 
about Landauer’s principle and reversible computing theory... 
= Constantly, people are generating false “disproofs” of these concepts... 
= Simply beating back all of the misinformation would be a full-time job!
= Work in progress (with K. Shukla): Correctly formulating the basic
thermodynamics of computation and reversible computing theory 
in the language of nonequilibrium quantum thermodynamics 


= Landauer’s principle itself can already be proved from quantum stat. mech. 
without making any essential equilibrium assumptions... However: 


= Reformulating a complete theory of Landauer’s principle/reversible computing 
in standard non-equilibrium language should substantially help dispel confusion 


— E.g., the precise role of the fluctuation/dissipation theorem w.r.t. the limits of 
general real machines should be more carefully & thoroughly addressed 


= Also needed: A full quantum-mechanical model of reversible
computing. (Self-contained, complete, realistic, buildable.)


= Need to explore fundamental physical phenomena that could be used to
suppress the tendency towards chaos in conservative dynamical systems 


Rigorous One-Slide Landauer Proof!


= Let $X, Y$ be any two subsystems of a computer.
= Joint probability distribution $P(X,Y)$, joint entropy $H(X,Y)$.
= Mutual information def’d: $I(X;Y) = H(X) + H(Y) - H(X,Y)$.
= Define independent entropy in $Y$ as the rest of $Y$’s entropy:
$S_{\mathrm{ind}}(Y) = H(Y) - I(X;Y) = H(Y|X)$.
= Now, consider erasing $Y$ via any oblivious physical mechanism...
= Meaning, set $H(Y) = 0$ w/o reference to $X$ or any other info. about $Y$
= Can try to “reverse” the erasure process to restore the old $H(Y)$...
= But now, $I(X;Y) = 0$ (any correlations have become lost!)
= $S_{\mathrm{ind}}(Y) = H(Y)$, $\therefore\ \Delta S_{\mathrm{ind}}(Y) = I(X;Y)_{\mathrm{orig}} = \Delta S_{\mathrm{tot}}$
= If originally $Y$ was (deterministically) computed from $X$, then:
= $H(Y|X) = 0$, i.e., $S_{\mathrm{ind}}(Y) = 0$, so $H(Y) = I(X;Y)$.


= Apparent entropy of all computed bits is actually entirely mutual information!
- a.k.a. “information-bearing entropy” in Anderson’s terminology


= Independent entropy (and total universe entropy!) has increased by
$\Delta S_{\mathrm{tot}} = \Delta S_{\mathrm{ind}}(Y) = I(X;Y)_{\mathrm{orig}} = H(Y)$. Q.E.D.!
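
The bookkeeping in this proof can be checked numerically. The sketch below is only an illustration under assumed inputs: X is two uniform random bits and Y is their AND (a deterministic function of X), and the "erased, then restored" Y is modeled as a variable with the same marginal distribution as before but independent of X. All function names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal distribution of component `index` of the (x, y) outcomes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[index]] += p
    return dict(out)

def analyse(joint):
    """Return H(X), H(Y), I(X;Y) and S_ind(Y) = H(Y) - I(X;Y)."""
    h_x = entropy(marginal(joint, 0))
    h_y = entropy(marginal(joint, 1))
    mutual = h_x + h_y - entropy(joint)
    return h_x, h_y, mutual, h_y - mutual

# Before erasure: Y = AND of the two bits in X, so Y is fully determined by X.
before = {((a, b), a & b): 0.25 for a in (0, 1) for b in (0, 1)}
h_x, h_y, i_xy, s_ind_before = analyse(before)

# After oblivious erasure and "restoration": same marginals, no correlation.
p_x, p_y = marginal(before, 0), marginal(before, 1)
after = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}
_, _, i_after, s_ind_after = analyse(after)

delta_s = s_ind_after - s_ind_before
print(f"H(Y) = {h_y:.4f}, I(X;Y) before = {i_xy:.4f}, S_ind before = {s_ind_before:.4f}")
print(f"I(X;Y) after = {i_after:.4f}, increase in independent entropy = {delta_s:.4f}")
```

In this example the independent entropy of Y starts at zero and rises by exactly H(Y) once the correlation with X is lost, which is the content of the final line of the proof.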


Computer Science issues


= The CS research community (within its reversible computation 
subfield) already has begun to address these topics, but more 
work is still needed in a number of important areas, such as: 
= More space/time efficient reversible algorithms for important problems 
= Broadening reversible logic theory & synthesis efforts to include more 
general classes of models of reversible computation, including: 
= Generalized (Conditional) Reversible Computing (topic of my RC17 paper)
- Appropriate for adiabatic circuit design; cf. collab. with Wille & Zulehner
= Asynchronous (Ballistic) Reversible Computing (topic of my ICRC17 paper)
- Basis of a $1.5M internal superconducting circuit design effort at Sandia
= Hardware description languages for adiabatic/reversible circuit design
= Ongoing dialogue with Wille/Zulehner @ JKU, & Perumalla @ ORNL
= Systems engineering of novel computer architectures that trade off 
energy savings via reversibility vs. realistic cost metrics in key areas 
including hardware efficiency and serial performance, while accounting 
for real nonidealities and parasitic losses 
= This one is more engineering than CS, but it is nevertheless essential!


A bit of history...

[Figure: Pendulum project chip photos, including adiabatic CPUs with reversible ISAs, an adiabatic FPGA, and an adiabatic RAM.]

= MIT Pendulum Project
= Led by Tom Knight w. Norm
Margolus, 1997-99
= Used adiabatic, reversible
SCRL logic family invented
by Younis & Knight in 1994
= Their prior CRL family (’93) established that arbitrary (pipelined,
sequential) digital logic is doable with reversible adiabatic switching.
= The Pendulum project developed several fully reversible
processor chips, which I helped to design...
= Proof-of-concept designs, not highly optimized, bug fixes needed 


= BUT: Demonstrated that reversibility does not pose any fundamental 
barrier to computer architecture for general-purpose computation 


Device Technology Issues


= Some of the recently-active research areas and
groups in terms of device technologies for
reversible computing include:


= Reversible adiabatic superconducting logics
= nSQUID logic of Vasili Semenov (& student Jie Ren) at
SUNY Stony Brook
- Results near kT, but line of work is not currently active
= RQFP group at Yokohama National University (Japan)
- N. Takeuchi, T. Yamae, Y. Yamanashi, N. Yoshikawa
- Have simulations below kT ln 2, working test chip
= Nanomechanical rod logic (a.k.a. rotary link logic)
= Ralph Merkle and colleagues at IMM
- Improving upon old (’91-’92) work by K. Eric Drexler
= More large-scale modeling still needed
= Still very far from manufacturability
= Quantum-Dot Cellular Automata (QDCA), Notre Dame
= There are a few scattered others, but this field is not
very well unified/coherent...


= Needed: A workshop dedicated to device & circuit
engineering for reversible computing!

[Figure: QCADesigner screenshot showing a simple 4-bit processor layout.]


Existing Energy-Delay Comparison

[Figure: Energy & delay for full adder cell]


Nanomechanical Rod Logic


Merkle et al., IMM Rep. 46 and arXiv:1801.03534;
Hogg et al., Mol. Sys. Des. & Eng., DOI: 10.1039/C7ME00021A


Rod Logic Lock Operation


Matt Moses, https://youtu.be/-YPeXK2PTPA 
= Videos animate schematic 
geometry of a pair of locks 

in a shift register 


= And a rotating cam wheel
driver


= Below: An example of a 
machinable test structure 


Energy-Delay, CMOS vs. Rod Logic


(Hogg et al. ’17, Mol. Sys. Des. & Eng.)
= Rod logic dissipation was simulated in a
careful analysis based on fluctuation-
dissipation relations
= Molecular Dynamics modeling/simulation
tools used for analysis include:
= LAMMPS, GROMACS, AMBER Antechamber
= Simulated dissipation:
= ~4 × 10^-26 J/cycle at 100 MHz
= Note this is 74,000× below the Landauer
limit for irreversible ops!
= Note also: ... than end-of-roadmap CMOS.
= Speeds into GHz range should be achievable.

[Figure: Energy & delay, CMOS vs. rod logic. Vertical axis: energy dissipation per signal per cycle, 1E-26 to 1E-14 J; reference line: kT @ T=300K.]
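
As a quick consistency check on the rod logic numbers, the sketch below divides the room-temperature Landauer limit by the simulated dissipation. It uses the slide’s rounded value of 4 × 10^-26 J/cycle, so the result only approximates the quoted 74,000× (which presumably comes from an unrounded figure); the variable names are illustrative.

```python
import math

K_B = 1.380649e-23                        # Boltzmann constant, J/K
landauer_j = K_B * 300.0 * math.log(2)    # kT ln 2 at T = 300 K

rod_logic_j = 4e-26        # rounded simulated dissipation per cycle (from the slide)
clock_hz = 100e6           # 100 MHz operating point (from the slide)

ratio = landauer_j / rod_logic_j
power_w = rod_logic_j * clock_hz          # dissipated power per signal

print(f"Landauer limit / rod logic dissipation = {ratio:,.0f}")
print(f"Dissipated power per signal at 100 MHz = {power_w:.1e} W")
```

With the rounded input the ratio lands in the low 70,000s, the same order as the slide’s figure.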
Adiabatic Reversible Computing

A general class of implementation techniques for reversible
computing that relies on controlled adiabatic transformations of
the information-bearing degrees of freedom.


= Has been explored in various physical systems:
= Superconducting electronics (Likharev ’77, etc.)
= LC switching circuits (Fredkin & Toffoli ’78)
= Adiabatic CMOS (Seitz ’85, etc.)
= Molecular nanomechanical logic (Drexler ’91, etc.)
= Single-electron quantum dots (Lent ’92, etc.)


= Some drawbacks of this class of approaches:
= Every logic transition must be explicitly driven by a power-clock
= Numerous clocks are required in combinational and sequential designs
= Substantial design complexity overhead to distribute clocks to every gate
= Challenging to design finely-tuned, high-Q power-clock resonators
= Problems with load balancing in long-range global clock distribution
networks with large parasitics, avoiding data-dependent back-action
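
To illustrate what a "controlled adiabatic transformation" buys in the electronic case, the following Python sketch integrates a textbook series RC circuit charged by a linear voltage ramp and compares the heat dissipated in the resistor for a near-instant step versus a slow ramp. This is a generic model, not any specific circuit from the talk; the component values and the function name are hypothetical.

```python
def ramp_dissipation(r_ohm, c_farad, v_final, ramp_time, steps=200_000):
    """Energy dissipated in R while the source ramps linearly from 0 to v_final
    over ramp_time and then holds, integrated with forward-Euler steps."""
    tau = r_ohm * c_farad
    total_time = ramp_time + 20 * tau        # let the capacitor settle afterwards
    dt = total_time / steps
    v_cap, energy = 0.0, 0.0
    for k in range(steps):
        v_src = v_final * min(k * dt / ramp_time, 1.0)
        current = (v_src - v_cap) / r_ohm
        energy += r_ohm * current * current * dt
        v_cap += current * dt / c_farad
    return energy

R, C, V = 1e3, 1e-15, 1.8                    # hypothetical switch resistance, load, swing
tau = R * C
cv2 = C * V * V

fast = ramp_dissipation(R, C, V, 1e-3 * tau)   # effectively a conventional step
slow = ramp_dissipation(R, C, V, 100 * tau)    # adiabatic ramp, 100 RC long

print(f"step:  {fast / cv2:.4f} x CV^2  (conventional value is 0.5)")
print(f"ramp:  {slow / cv2:.4f} x CV^2  (about RC/T = 0.01 for a slow ramp)")
```

The slow ramp dissipates roughly (RC/T)·CV², so dissipation falls linearly as the transition is stretched out, which is the linear energy-versus-speed tradeoff that the power-clock machinery listed above exists to exploit.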


Conditionally-Reversible Boolean
Logic in Adiabatic CMOS Circuits

[Figure: circuit schematic and timing, with inputs A, B switching at @1, driver D switching at @2, and output Q.]

= This simple CMOS structure can be used to
do/undo latched reversible rOR operations
= Example of 2LAL logic family (Frank ’00)
= Based on CMOS transmission gates
= Uses dual-rail complementary signals (PN pairs)
= Similar to orig. CRL family of Younis & Knight ’93
= Computation sequence:
Precondition: Output signal Q is initially at logic 0
By design, driving signal D is also initially logic 0
1. At time 1 (@1), inputs A, B transition to new levels
= Connecting D to Q if and only if A or B is logic 1
2. At time 2 (@2), driver D transitions from 0 to 1
= Q follows it to 1 if and only if A or B is logic 1
= Now Q is the logical OR of inputs A, B
= Reversible things that we can do afterwards:
= Restore both A, B to 0 (latching Q in place), or,
= Undo above sequence (decomputing Q back to 0)
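
The two-step sequence above can be captured in a purely behavioral model. The Python sketch below is an assumption-laden illustration (logic levels only, no analog behavior, hypothetical function names): it follows the stated precondition Q = D = 0, connects D to Q when A or B is 1, ramps D, and then checks that undoing the sequence returns Q to 0 for every input combination.

```python
from itertools import product

def ror_compute(a, b):
    """Forward sequence: returns the value latched on Q."""
    q, d = 0, 0                    # precondition: Q and driver D both start at logic 0
    connected = bool(a or b)       # step 1: inputs settle; D is tied to Q iff A or B is 1
    d = 1                          # step 2: driver D transitions from 0 to 1
    if connected:
        q = d                      # Q follows D only while connected
    return q

def ror_uncompute(a, b, q):
    """Reverse sequence with the same inputs still present: D returns to 0."""
    connected = bool(a or b)
    d = 0                          # driver D transitions from 1 back to 0
    if connected:
        q = d                      # Q follows D back down
    return q

for a, b in product((0, 1), repeat=2):
    q = ror_compute(a, b)
    print(f"A={a} B={b} -> Q={q}, after undo Q={ror_uncompute(a, b, q)}")
```

The model makes the conditional nature of the reversibility visible: the operation is only guaranteed to be undoable because Q is known to start at 0, which is the assumption the "Precondition" line encodes.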


2LAL Shift Register Structure


= 1-tick delay per logic stage (Animation: http://y2u.be/c18mDIOq11Q)
= Logic pulse timing and signal propagation:

[Figure: 2LAL shift register schematic and pulse timing diagram.]


Simulation Results (Cadence/Spectre)


Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL
2LAL = Two-level adiabatic logic (invented at UF, ’00)
= Graph shows per-FET power dissipation vs. frequency
= in an 8-stage shift register.
= At moderate freqs. (1 MHz),
= Reversible uses < 1/100th the
power of irreversible!
= At ultra-low power levels
(1 pW/transistor)
= Reversible is 100× faster than
irreversible!
= Minimum energy dissipation
per nFET is < 1 electron volt!
= 500× lower dissipation than
best irreversible CMOS!
= 500× higher computational
energy efficiency!
= Energy transferred per nFET
per cycle is still on the order
of 1-10 fJ (10-100 keV)
= So, energy recovery efficiency
is at least 99.99%!
= Quality factor Q > 10,000!


— Note this does not include any of 
the parasitic losses associated 
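
The energy recovery claim follows from simple arithmetic on the figures quoted above. The sketch below uses the slide’s rounded values (at most 1 eV dissipated out of 10-100 keV transferred per nFET per cycle); the function name is illustrative, and the fJ-to-eV conversion is printed only for reference.

```python
EV = 1.602176634e-19   # joules per electron volt (exact SI value)

def recovery_efficiency(dissipated_ev, transferred_ev):
    """Fraction of the transferred signal energy that is recovered."""
    return 1.0 - dissipated_ev / transferred_ev

one_fj_in_ev = 1e-15 / EV
print(f"1 fJ = {one_fj_in_ev:,.0f} eV")

for transferred_ev in (10e3, 100e3):       # slide's rounded 10-100 keV range
    eff = recovery_efficiency(1.0, transferred_ev)
    print(f"{transferred_ev / 1e3:.0f} keV transferred, 1 eV lost: {eff:.4%} recovered")
```

The ratio of transferred to dissipated energy at the low end, 10 keV to 1 eV, is the same factor of 10,000 that appears in the quoted quality factor.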
Design Automation for Adiabatic Circuits
Collab. w. Wille & Zulehner (JKU), to be presented at ASP-DAC ’19, Tokyo

[Figures reproduced from the paper:
Fig. 1. Transmission gate for dual-rail signals
Fig. 2. Adiabatic gates: (a) OR gate
Fig. 3. Graph representations for Boolean functions: (a) AND-Inverter graph, (b) OR-Inverter graph
Fig. 4. Synthesized retractile circuit: (a) Circuit, (b) Power clocks
Fig. 7. Buffer element for fully-pipelined circuits: (a) Transmission gates, (b) Clocks
Fig. 8. OR gate for fully-pipelined circuits: (a) Computing OR]


Resonant Energy-Recovering Power
Supplies for Adiabatic Circuits


An extremely nontrivial, and extremely under-emphasized 
engineering challenge! 


= All existing adiabatic schemes for reversible computing (including 
the superconducting ones!) rely on a (typically unspecified) 
external system to deliver precisely-conditioned AC waveforms 
to drive their adiabatic transitions... 


= Ignoring the problem of how to design these systems to work efficiently 
(as almost everyone in the adiabatic circuits field does!) essentially just 
sweeps the entire real energy dissipation problem under the rug! 
= It’s extremely difficult to design a supply that actually recovers almost the 
entire signal energy... Engineering-wise, this is almost the entire problem! 

— We already know (ever since Younis & Knight’s CRL, 1993) in principle how to 
design fully-adiabatic switching circuits; that’s not even the hard part... It’s the 
energy recovery part that’s difficult! 

= Caveat: For the special case of cryogenic systems that dump small signal
energies to a room-temperature environment, the problem is less serious. 


Spectrum of Trapezoidal Wave


= Relative to mid-level crossing, waveform is an odd function
= Spectrum includes only odd harmonics f, 3f, 5f, ...


= Six-component Fourier series expansion is shown below
= Maximum offset with 11f frequency cutoff is < 1.7% of V_dd
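
The six-component expansion can be reproduced with a few lines of Python. The sketch below assumes a trapezoid whose rising and falling ramps each occupy one quarter of the period (the waveform geometry is not stated on this slide, so that is an assumption), uses the standard sine-series coefficients for such a wave, and measures the largest deviation of the truncated series from the ideal shape as a fraction of the full swing V_dd. All names are illustrative.

```python
import math

HARMONICS = (1, 3, 5, 7, 9, 11)    # six odd harmonics, cutoff at 11f
RAMP_HALF_WIDTH = math.pi / 4      # assumed: each ramp spans 1/4 of the period

def ideal_trapezoid(theta, amp):
    """Ideal trapezoid, odd about theta = 0 (the mid-level crossing)."""
    theta = (theta + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    sign = 1.0 if theta >= 0 else -1.0
    x = abs(theta)
    if x > math.pi / 2:
        x = math.pi - x                                   # quarter-wave mirror symmetry
    return sign * amp * min(x / RAMP_HALF_WIDTH, 1.0)

def fourier_trapezoid(theta, amp):
    """Truncated sine series with b_n = 4*amp*sin(n*a) / (pi*a*n^2)."""
    a = RAMP_HALF_WIDTH
    return sum(4 * amp * math.sin(n * a) / (math.pi * a * n * n) * math.sin(n * theta)
               for n in HARMONICS)

def max_offset_fraction(samples=8000):
    """Largest |series - ideal| over one period, as a fraction of the swing 2*amp."""
    amp = 0.5                                             # full swing V_dd = 1
    worst = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        worst = max(worst, abs(fourier_trapezoid(theta, amp) - ideal_trapezoid(theta, amp)))
    return worst / (2 * amp)

print(f"max offset with 11f cutoff: {max_offset_fraction():.2%} of V_dd")
```

The printed value can be compared directly with the bound quoted on the slide; the largest deviations occur at the corners of the trapezoid, where the truncated series rounds off the kinks.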
Resonator design effort, in progress...

Funding source: DOE ASC (Advanced Simulation and Computing) program
= Goal of this effort:
= Design & validate in simulation (and, stretch goal: with a physical prototype) a high-efficiency resonant
oscillator (for low-to-medium RF frequencies) that approximates a trapezoidal output voltage waveform
= Initial design concept:
= Coupled assemblage of LC tank circuits with resonant frequencies corresponding to odd multiples of the
fundamental frequency, excited in the right relative amplitudes to approximate the target wave shape
= Some detailed requirement specifications:
= Initial target operating point: 230 kHz, 1.8 V (optimal point for minimum dissipation in the UF study) (MET.)
= Explore a wider range of parameter values as the project proceeds
= Tops and bottoms of trapezoidal wave should be within <5% of flatness throughout 1/4 clock period. (MET.)
= The 10-90% rise/fall time should be between 75 & 100% of its nominal value (80% of 1/4 clock period) (MET.)
= Efficiency goals:
= Quality factor of resonator during unpowered ring-down should be ≥1,000. (MET. Measured value: ~19,550.)
= Total energy dissipation per cycle during steady-state powered operation should be <1% of magnetically-stored
energy in the resonator, when the oscillator is running in isolation.
= Total energy dissipation per cycle during steady-state powered operation should be <10% of the capacitively-
stored energy on an appropriately-sized model (RC) load, when the oscillator is coupled to the load.


= A number of significant design challenges that have been encountered so far:
= How to tune the relative amplitudes of the component resonant modes (Solved.)
= How to prevent phase drift and transfer of energy between modes (Solved.)
= Identifying/tailoring components to have precise-enough L, C values
= Designing a driver circuit that meets efficiency goals during steady-state operation
= We have already solved a number of the problems encountered, but still have a ways to go...
= We have only spent 1 year/$250K on this effort so far.
= Budget increased to $300K for next FY. Goal for next FY: Get to a publishable result.
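
Two of the numbers in this specification can be related with elementary resonator formulas. The Python sketch below is only a back-of-the-envelope aid: it sizes ideal, uncoupled LC tanks at odd multiples of the 230 kHz target using f = 1/(2π√(LC)) with a hypothetical inductance (coupling between tanks would shift the real mode frequencies), and it converts the measured ring-down Q into a fractional energy loss per cycle via the high-Q relation 2π/Q.

```python
import math

F0_HZ = 230e3             # initial target operating point (from the slide)
INDUCTANCE_H = 100e-6     # hypothetical inductor value, for illustration only

def tank_capacitance(freq_hz, inductance_h):
    """Capacitance that resonates with inductance_h at freq_hz (ideal LC tank)."""
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * inductance_h)

def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))

def loss_fraction_per_cycle(q_factor):
    """Fraction of stored energy lost per cycle in free ring-down (valid for Q >> 1)."""
    return 2 * math.pi / q_factor

for n in (1, 3, 5, 7, 9, 11):
    cap = tank_capacitance(n * F0_HZ, INDUCTANCE_H)
    print(f"harmonic {n:2d}: f = {n * F0_HZ / 1e3:7.0f} kHz, C = {cap * 1e9:8.3f} nF")

print(f"Q = 19,550 -> {loss_fraction_per_cycle(19550):.3%} of stored energy lost per cycle")
```

Under these idealizations the measured ring-down Q corresponds to a per-cycle loss far below 1%, which shows how much margin the unpowered resonator has relative to the powered-operation goals that remain open.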


Superadiabatic Scaling of Efficiency


Can we do better than linear scaling of energy with speed? → YES!
= Observations from Pidaparthi & Lent, 2018:
= Landau-Zener ’32 (!) formula for quantum transitions in
atomic scattering problems shows
that the probability of exciting the ...
= ... seen feature in many quantum systems!


Reference: “Exponentially Adiabatic Switching in Quantum-Dot
Cellular Automata,” J. Low Power Electron. Appl. 2018, 8(3), 30;
https://doi.org/10.3390/jlpea8030030
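
Because the slide text is truncated here, the following Python sketch falls back on the textbook Landau-Zener formula, P = exp(-2π|H12|² / (ħ·|d(E1−E2)/dt|)), rather than whatever exact expression the cited paper uses. The coupling energy, sweep range, and switching times are hypothetical; the point is only to show how the excitation probability depends on switching time for a linear sweep.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
EV = 1.602176634e-19     # joules per electron volt

def landau_zener_probability(coupling_j, sweep_rate_j_per_s):
    """Probability of a diabatic (excitation) transition for a linear energy sweep."""
    return math.exp(-2 * math.pi * coupling_j ** 2 / (HBAR * abs(sweep_rate_j_per_s)))

def excitation_probability(switch_time_s, coupling_ev=1e-3, sweep_range_ev=10e-3):
    """Sweep the level separation through sweep_range_ev in switch_time_s (hypothetical values)."""
    rate = sweep_range_ev * EV / switch_time_s
    return landau_zener_probability(coupling_ev * EV, rate)

for t_ps in (1, 2, 5, 10):
    p = excitation_probability(t_ps * 1e-12)
    print(f"switching time {t_ps:2d} ps -> excitation probability {p:.3e}")
```

In this model the exponent is proportional to the switching time, so doubling the time squares the excitation probability, in contrast to the 1/T improvement of ordinary adiabatic charging.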
Ballistic Reversible Computing


= Original concept:
= Fredkin & Toffoli’s Billiard Ball Model of
computation (“Conservative Logic,” 1982)
= Based on elastic collisions between moving objects
= Spawned a subfield of “collision-based computing”
- Localized pulses/solitons in various media
= No power-clock signals needed!
= Devices operate when data signals arrive
= The operation energy is carried by the signal itself
= Most of the signal energy is preserved in outgoing signals


= However, existing design concepts for ballistic computing invoke
implicitly synchronized arrivals of ballistically-propagating signals...


= Making this work in reality presents some serious difficulties:
= Unrealistic in practice to assume precise alignment of signal arrival times 
— Thermal fluctuations & quantum uncertainty, at minimum, are always present 
= Any relative timing uncertainty leads to chaotic dynamics when signals interact 
— Exponentially-increasing uncertainties in the dynamical trajectory 


= Can we come up with a ballistic model that avoids these problems? 


Asynchronous Ballistic Reversible Computing


= To avoid the problems with dynamical chaos
that are inherent to collision-based computing,
= We must avoid any direct interaction between
ballistically-propagating signals
= Instead, require temporally-localized pulses to
arrive at distinct, non-overlapping times
= Device’s dynamical trajectory then becomes
independent of the precise pulse arrival time


= Timing uncertainty per logic stage now accumulates 
only linearly, not exponentially 
— Only occasional re-synchronization will be needed 


= To do logic, devices now must have internal state

= No power-clock signals, unlike adiabatic designs
= Devices simply operate whenever data pulses arrive 
= The operation energy is carried by the pulse itself 


= Most of the energy is preserved in outgoing pulses 
— Signal restoration can be carried out incrementally 


A new project has started at Sandia which aims 
to implement ABRC in superconducting circuits 
= 3-year, $1.5M internally-funded project 
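
The difference between linear and exponential accumulation of timing uncertainty can be made concrete with a toy calculation. The numbers below (1 ps of jitter added per stage, 1000 ps between pulses, uncertainty doubling per stage in the chaotic case) are purely hypothetical and chosen only to show the contrast in how many logic stages fit between re-synchronizations.

```python
def linear_stage_budget(jitter_ps, spacing_ps):
    """Stages traversed before additive jitter (n * jitter) exceeds the pulse spacing."""
    stages = 0
    while (stages + 1) * jitter_ps <= spacing_ps:
        stages += 1
    return stages

def exponential_stage_budget(jitter_ps, spacing_ps, growth=2):
    """Stages traversed before jitter * growth**n exceeds the pulse spacing."""
    stages = 0
    while jitter_ps * growth ** (stages + 1) <= spacing_ps:
        stages += 1
    return stages

JITTER_PS, SPACING_PS = 1, 1000     # hypothetical per-stage jitter and pulse spacing

print("asynchronous (linear growth):   ", linear_stage_budget(JITTER_PS, SPACING_PS), "stages")
print("collision-based (doubling/stage):", exponential_stage_budget(JITTER_PS, SPACING_PS), "stages")
```

With these made-up values the linear case tolerates about a thousand stages while the exponential case tolerates fewer than ten, which is the sense in which only occasional re-synchronization is needed.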
[Figure: Example ABR device functions]


ABRC in superconducting circuits


= One intriguing possible candidate implementation technology is
to use superconducting circuits... 
= SFQ (single flux quantum, or fluxon) pulses on appropriately constructed 
superconducting transmission lines can carry info. with relatively low 
dispersion and high propagation velocity (e.g. 2/3 c) 
= Fluxons are naturally quantized by the SQUID-like circuits that produce them, 
and are naturally polarized (carry 1 bit’s worth of +/— polarization state 
information per pulse) 
— Need to select suitable ABRC primitives operating on arity-2 signals 
= Fluxons trapped in loops (SQUID-like structures) can hold data 
quiescently 
= Generally, loops hold integer numbers of fluxons in some small range:
..., -2, -1, 0, +1, +2, ...
= How exactly to implement the reversible interactions? 
= A 3-year, internally-funded project at Sandia has started to investigate this... 


A Very Recent Advance!


Osborn & Wustmann (LPS), arXiv:1711.04339, 1806.08011 (and RC ’18 proceedings)


= The circuit shown at right
can be considered as a 2-
terminal ABRC device for
binary pulses (fluxons)
= The specified function is to
preserve or flip the polarity
of a fluxon passing through,
depending on device
parameters
= Here, the “wires” are LJJ
transmission lines
= Major loss mechanism is
resonant plasmon emission ...
= ... fluxon decay time is ~107
junction switching times
given initial v = 0.6c.
= W&O’s paper also describes some
more complex (4-terminal) devices
= Synchronous so far, but they are now
starting to explore asynchronous

W&O’s simulation of identity/NOT


= Direct numerical integration of
JJ circuit’s equations of motion
= Lagrangian:
[Equation: Lagrangian of the discrete JJ circuit]
= Gives a discrete approximation to
sine-Gordon equation:
$\ddot{\phi} - c^2 \phi'' + \omega_J^2 \sin\phi = 0$
= Scattering interaction at
interface is nearly elastic
= Loss in fluxon velocity of only 4%
= Loss in energy of 2.1-2.5%

[Figure: simulation results, panels (b) and (c).]
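
The quoted velocity and energy losses can be cross-checked against the continuum sine-Gordon model. The Python sketch below works in normalized units (c = 1, ω_J = 1), verifies by finite differences that the standard moving-kink solution satisfies the equation, and then uses the kink energy E = 8γ to estimate the energy change for a 4% velocity loss. The initial velocity of 0.6c is taken from the preceding slide and is an assumption about which simulation the loss figures refer to.

```python
import math

def lorentz_gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def kink(x, t, v):
    """Moving fluxon (kink) solution: phi = 4*atan(exp(gamma*(x - v*t)))."""
    return 4.0 * math.atan(math.exp(lorentz_gamma(v) * (x - v * t)))

def residual(x, t, v, h=1e-3):
    """Finite-difference estimate of phi_tt - phi_xx + sin(phi); ~0 for a solution."""
    phi = kink(x, t, v)
    phi_tt = (kink(x, t + h, v) - 2 * phi + kink(x, t - h, v)) / h ** 2
    phi_xx = (kink(x + h, t, v) - 2 * phi + kink(x - h, t, v)) / h ** 2
    return phi_tt - phi_xx + math.sin(phi)

def fluxon_energy(v):
    """Kink energy in normalized units where the rest energy is 8."""
    return 8.0 * lorentz_gamma(v)

v_in = 0.6                       # assumed initial fluxon velocity (fraction of c)
v_out = v_in * (1.0 - 0.04)      # 4% velocity loss, as quoted on the slide
energy_loss = 1.0 - fluxon_energy(v_out) / fluxon_energy(v_in)

worst = max(abs(residual(0.5 * k, 0.3, v_in)) for k in range(-10, 11))
print(f"largest sine-Gordon residual on test points: {worst:.1e}")
print(f"energy loss for a 4% velocity loss at 0.6c: {energy_loss:.2%}")
```

Under these assumptions a 4% drop in velocity corresponds to an energy loss of a little over 2%, in line with the lower end of the range reported from the simulation.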


WRSPICE simulations of discrete LJJ


Collab. w. Lewis, Missert, Wolak & Henry @ Sandia
= ASC ’18, 10.1109/TASC.2019.2904962
= Modeled buildable test structures in XIC
= Confirmed ballistic fluxon propagation
= Confirmed predicted LJJ line impedance of 16 Ω

[Figure: test-structure schematic; component labels include 7.845 pH inductors and ics = 1.5 µA.]
An example “baby step” towards
inventing a better SC logic family...


= The following is in the nature of a small,
concrete research challenge problem:
= As a community, can we solve the following
superconducting circuit design exercise?
= Either find a solution, or prove rigorously
that it’s impossible under the given
constraints


Problem: Design a Ballistic Reversible Memory Cell

[Figure: a moving fluxon on a ballistic interconnect (PTL or LJJ) enters a circuit holding a stationary SFQ. Input syndrome → output syndrome: -1(+1) → (-1)+1; +1(-1) → (+1)-1.]

Some planar, reactive SCE circuit with a continuous
superconducting boundary (to be designed)
* Only contains L’s, M’s, C’s, and unshunted JJs
* Conserves total flux, ideally nondissipative


Desired circuit behavior (NOTE: conserves flux,
respects T symmetry & logical reversibility):
* If polarities are opposite, they are swapped (shown)
* If polarities are identical, input fluxon reflects
back out with no change in polarity (not shown)
* Elastic scattering type interaction: Fluxon kinetic
energy is (almost entirely) preserved
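
The desired behavior of the memory cell can be written down as a small state-transition table and checked for the properties the slide lists. The Python sketch below is a logic-level abstraction only (it says nothing about whether a circuit realizing it exists, which is the open problem); the function name is hypothetical, and the self-inverse check is used here as a simple stand-in for time-reversal symmetry.

```python
from itertools import product

def memory_cell(moving, stored):
    """Map (incoming fluxon polarity, stored SFQ polarity) to (outgoing, stored)."""
    if moving != stored:
        return stored, moving       # opposite polarities: the two are swapped
    return moving, stored           # identical polarities: fluxon reflects unchanged

states = list(product((+1, -1), repeat=2))
images = [memory_cell(m, s) for m, s in states]

is_bijective = len(set(images)) == len(states)                       # logical reversibility
conserves_flux = all(sum(a) == sum(b) for a, b in zip(states, images))
is_self_inverse = all(memory_cell(*memory_cell(m, s)) == (m, s) for m, s in states)

for before, after in zip(states, images):
    print(f"in {before[0]:+d}, stored {before[1]:+d} -> out {after[0]:+d}, stored {after[1]:+d}")
print(is_bijective, conserves_flux, is_self_inverse)
```

All three checks pass for this table, so the specification is at least internally consistent with flux conservation and logical reversibility; the design challenge is finding reactive circuit dynamics that implement it elastically.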


Conclusion
= A mature reversible computing technology is a prerequisite if we wish to
sustain practical performance growth of digital systems over the long term
= This is guaranteed by irrefutable facts of fundamental physics...
= However, the engineering of fast & thermodynamically efficient physical
implementations of reversible computing is a field that is still very much in its
infancy, and, as a research area, is still extremely poorly organized...


= Far, far more focused work is needed in key areas such as novel device physics for RC,
resonator design for adiabatic circuits, and elastic circuits for ballistic computing...
= The mainstream electronics industry has, historically, not appeared interested in even
attempting to tackle any of these kinds of engineering problems...
- Perhaps due to a misperception that approaching RC is too difficult, or even impossible?
= The rate of progress would likely be significantly increased by:
= Improved understanding of the fundamental physics of reversible computing
= Working demonstrations of useful computations at very low energy dissipation levels
= Important: While taking the power supply into account!
= Workshops in key underdeveloped research areas such as reversible device physics
= Increased support for basic physics & engineering research for reversible computing
= I would advise funding agencies to dedicate substantial resources to R&D in
these areas, if they ever want a reversible computing revolution to happen...
= It’s definitely not going to happen if everyone just sits around and waits for it!


Thank you... Questions? 


Never, never, never give up.


Be ashamed to die until you have
won some victory for humanity.
Horace Mann


“The ones who are crazy enough to think
that they can change the world,
are the ones who do.”
Steve Jobs, 1955-2011